Dynamic power reduction

ABSTRACT

Some embodiments of the invention include systems, apparatuses, and methods for dynamically reducing requested supply voltage based on idle functional blocks.

BACKGROUND

1. Technical Field

Some embodiments of the present invention generally relate to power management techniques. In particular, some embodiments relate to power management through dynamic supply voltage reduction.

2. Discussion

As the trend toward advanced processors with more transistors and higher frequencies continues to grow, computer designers and manufacturers are often faced with corresponding increases in power consumption. Without power management, integrated circuits (ICs) such as processors with multiple cores can consume excessive power. Accordingly, new power management approaches are desired.

BRIEF DESCRIPTION OF THE DRAWINGS

Various advantages of embodiments of the present invention will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:

FIG. 1 is a block diagram of an integrated circuit with voltage selection logic (VSL) according to some embodiments of the invention;

FIG. 2 is a flow diagram of a voltage selection routine that may be performed by the VSL according to some embodiments of the invention;

FIG. 3 is a diagram of a multi-core processor with a VSL according to some embodiments of the invention;

FIG. 4 is a flow diagram of a voltage selection routine for the processor of FIG. 3 according to some embodiments of the invention;

FIG. 5 is a flow diagram of a routine to determine load-line drop reduction according to some embodiments of the invention;

FIG. 6 is a block diagram of voltage selection logic according to some embodiments of the invention;

FIG. 7 is a system-level block diagram of an example computer system according to some embodiments of the invention; and

FIG. 8 is a diagram of a multi-core processor with multiple supply voltage domains according to some embodiments of the invention.

DETAILED DESCRIPTION

In accordance with some embodiments, the requested supply voltage from a voltage regulator module (VRM) to an integrated circuit device (such as a processor) can be dynamically reduced when inactivity in the IC is identified, because less voltage will be dropped across the VRM power delivery network (load-line). That is, the same or higher supply voltage can be provided to the chip by a smaller voltage from the VRM when less supply current is required from the VRM, because the smaller current results in a smaller drop across the load-line. This is beneficial, for example, because lower voltages typically result in lower power and improved reliability. In an exemplary application, a processor with multiple cores may request a reduced supply voltage from its VRM when it recognizes that one or more of its cores are idle, because less current will be drawn from the VRM.

FIG. 1 generally shows an IC device 102 coupled to a VRM 104 to receive from it a supply voltage V_(C) in response to a requested regulator voltage V_(R). The value of the regulator voltage V_(R) is determined by voltage selection logic (VSL) 108 and communicated to the VRM 104 through a control signal (V_(R)CNTL). The IC 102 could be any IC device implementing, for example, a system-on-a-chip (SOC), processor, ASIC, network component, controller, or the like. It has one or more functional blocks, such as cores or the like, that may be active or idle (e.g., having an active clock when active, or a turned-off or substantially slowed clock when idle). The VSL 108 has the ability to determine an amount of reduced load-line drop and/or supply current reduction (which translates to load-line drop) based on how many and/or which functional blocks are idle. (Note that as used herein, the term “determine” or “determining” refers to obtaining a result through measurement, estimation, calculation, derivation, identification, and the like and is intended to be used in its broadest sense.)

With many applications, depending on desired operating performance, specifications may require that the supply voltage V_(C) be at or above a minimum level. However, the supply voltage (V_(C)) actually received by the IC is smaller than the regulator voltage (V_(R)) due to the voltage drop over the power delivery network, modeled as the load-line resistance R_(LL). The load-line voltage drop will be: R_(LL)×I_(C), where I_(C) is the supply current drawn by the IC. Therefore, V_(C) is: V_(R)−(I_(C)×R_(LL)). Accordingly, this drop should be considered in order to obtain an acceptable supply voltage V_(C) at the IC.
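The load-line relationship above can be illustrated with a short Python sketch. The function names and the numeric values (a 1.0 V minimum V_(C), a 1 mΩ load-line, and 100 A versus 75 A of supply current) are assumptions chosen only for illustration and do not come from the specification.

```python
def received_voltage(v_r: float, i_c: float, r_ll: float) -> float:
    """Supply voltage V_C at the IC: V_R minus the load-line drop I_C * R_LL."""
    return v_r - i_c * r_ll


def required_regulator_voltage(v_c_min: float, i_c: float, r_ll: float) -> float:
    """Smallest V_R that still delivers v_c_min at supply current i_c."""
    return v_c_min + i_c * r_ll


# Hypothetical operating point: all blocks active versus some blocks idle.
v_r_all_active = required_regulator_voltage(1.0, 100.0, 0.001)
v_r_some_idle = required_regulator_voltage(1.0, 75.0, 0.001)
```

Under these assumed numbers, the lower supply current permits a lower requested V_(R) while V_(C) stays at the same minimum level.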

To meet the V_(C) requirement, traditional approaches, for example, select a regulator voltage (V_(R)) value such that V_(C) will not go below the specified value, even when all functional blocks are active. That is, a worst-case supply current (I_(C)) is assumed (all blocks being active), and a V_(R) is requested to provide a V_(C) that meets this condition. With other known approaches, a VSL might decrement the requested V_(R) by a fixed, “safe” amount in response to functional blocks being idle, regardless of the IC's present performance state. However, this still fails to consider how much the supply can actually be reduced in view of the particular reduction in the drop across the load-line for particular operating conditions. Accordingly, with some embodiments disclosed herein, the reduction in load-line drop is determined (calculated, estimated, measured, derived and/or identified) based on the quantity and/or quality of idle blocks, to more optimally reduce the requested V_(R) and, at the same time, meet supply voltage requirements.

FIG. 2 generally shows a routine 200 that may be performed by VSL 108 to dynamically select reduced regulator voltages V_(R). At 202, it determines a received supply voltage specification. At 204, it determines one or more idle IC blocks. At 206, it determines a requested supply voltage value based on the specified received value and on a reduction in the load-line drop due to the identified one or more idle blocks. As will be discussed further below within the context of an exemplary multi-core processor IC, the reduced amount may be determined in any suitable manner. For example, a ΔV_(R) value could be acquired by looking it up in a memory structure based on operating conditions and number/type of idle blocks. Alternatively, it could be derived from interpolation of boundary values (e.g., fused into the IC) such as ΔV_(R) values or ΔI_(C) values based on operating conditions and number/type of idle blocks. As a further example, a ΔV_(R) could be calculated as a ΔV based on an estimation of ΔI_(C) (in view of the idle blocks) and a known value for R_(LL). Various other approaches may be used and are within the scope of the invention.
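One of the options named above, deriving ΔV_(R) by interpolating between boundary values, might look like the following Python sketch. It assumes, purely for illustration, that two per-idle-block ΔV_(R) boundary values are available at a minimum and maximum operating frequency and that linear interpolation between them is acceptable; the specification does not prescribe the interpolation variable or method, and all names and numbers here are hypothetical.

```python
def interpolate_delta_v_per_block(freq_hz: float, f_min: float, f_max: float,
                                  dv_at_f_min: float, dv_at_f_max: float) -> float:
    """Linearly interpolate a per-idle-block delta-V_R between two boundary values."""
    if not f_min < f_max:
        raise ValueError("f_min must be less than f_max")
    f = min(max(freq_hz, f_min), f_max)  # clamp to the characterized range
    t = (f - f_min) / (f_max - f_min)
    return dv_at_f_min + t * (dv_at_f_max - dv_at_f_min)


def delta_v_r(idle_blocks: int, freq_hz: float) -> float:
    """Total reduction: idle block count times the interpolated per-block value."""
    # Hypothetical boundary values standing in for fused data.
    per_block = interpolate_delta_v_per_block(freq_hz, 1.0e9, 3.0e9, 0.005, 0.015)
    return idle_blocks * per_block
```

Clamping to the characterized range is a conservative design choice in this sketch, so that no value is extrapolated beyond the assumed boundary data.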

FIG. 3 is a block diagram of a multi-core processor 302 with a VSL to request a reduced V_(R), based on one or more of its cores being idle, in accordance with some embodiments. Processor 302 has N cores 306 (core 0 to core N−1) and a VSL 308 to determine a supply voltage V_(R) to be requested from a VRM 104. In this example, the N cores are on the same power plane but have independent clock distributions. Furthermore, the cores are assumed to be copies of each other and have similar power characteristics.

Processor 302 may operate in different performance states, as determined by applications being processed. The term “performance state” generally refers to an operating level specification for a processor or cores within a processor. For example, a common performance state specification, the Advanced Configuration and Power Interface (ACPI) specification, defines different P-states to dictate operating core voltage and frequency for the different performance states within the specification. With this specification, P0 is the highest performance state, while Pn is the lowest performance state.

With some platforms, when starting up, the computing platform basic input/output system (BIOS) builds a P-state data structure, to provide P-state information to the processor, based on data obtained from the processor (for example, from programmed, e.g., fused, boundary values). For each performance state, the data structure provides the specified operating supply voltage and frequency. In some embodiments, different P-states can be requested (e.g., from different operating system threads) for each core's process, but control logic will choose the most active state and apply it to each core. At the same time, however, this does not necessarily mean that every core will be running at the specified P-state parameters. There also may be so-called underlying C-states, which may be separately applied to the various cores. So while a relatively active P-state (e.g., P0 or P1) may be assigned for the overall processor 302, some cores, as designated by their C-states, may actually be idle (e.g., have turned-off or substantially reduced clocks).

The chip supply current (I_(C)) has a dynamic component (I_(CDy)) and a static component (I_(CS)) such that: I_(C)=I_(CDy)+I_(CS). The dynamic component (I_(CDy)) represents the switching current, while the static component (I_(CS)) represents the leakage current. A core typically consumes static (leakage) current regardless of whether the core is active or idle, but its dynamic current depends on its clock. If a core's clock is turned off or substantially reduced, then it can reasonably be assumed that, for that core: I_(CDy)=0. In some embodiments, it can also reasonably be assumed that the cores 306 have the same (or sufficiently similar) power characteristics, i.e., dynamic current consumption for a given performance state. This assumption can be used to determine how much the dynamic component (I_(CDy)) of the overall supply current (I_(C)) will drop for a given P-state based on the number of cores that are idle.

For a given P-state, a dynamic current per-core value (I_(CDyi)) can be multiplied by the number of idle cores, i, to obtain the overall reduction in dynamic current (ΔI_(CDy)) and thus, the overall reduction in supply current (I_(C)). (The reduction is relative to a pre-assumed value used to define the V_(R) value to meet the V_(C) requirement.) From this, the amount ΔV_(R) by which the requested V_(R) can be lowered is: ΔI_(CDy)×R_(LL). This ΔV_(R) can be subtracted from the higher V_(R) that otherwise would have been used to meet the specified requirements.

(In exemplary embodiments discussed herein, it is generally assumed that each core consumes the same amount of dynamic current for a given performance state when active. This allows one to estimate the overall current reduction by multiplying the number of idle cores by a per-core current value for a given P-state (performance state). It should be recognized, however, that this assumption is not necessary. For example, separate per-core current values for different types or classes of cores or for each core could be used, and the separate currents could be added to arrive at an overall supply current reduction.)
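The per-core estimate, and the variant with separate per-core currents, can be sketched in Python as follows. The function names and the example currents and resistance are hypothetical and serve only to show the arithmetic.

```python
from typing import Iterable


def delta_v_uniform(idle_cores: int, i_dyn_per_core: float, r_ll: float) -> float:
    """Delta-V_R when every core draws the same dynamic current for the P-state."""
    delta_i = idle_cores * i_dyn_per_core  # overall reduction in dynamic current
    return delta_i * r_ll


def delta_v_heterogeneous(idle_core_currents: Iterable[float], r_ll: float) -> float:
    """Delta-V_R when each idle core has its own dynamic current value."""
    return sum(idle_core_currents) * r_ll


# Hypothetical values: 2 idle cores at 20 A each, 1 milliohm load-line.
uniform = delta_v_uniform(2, 20.0, 0.001)
# Hypothetical mix of a large (20 A) and a small (10 A) idle core.
mixed = delta_v_heterogeneous([20.0, 10.0], 0.001)
```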

FIG. 4 shows a routine 400 for selecting a requested voltage V_(R) from VRM 104. Routine 400 may be performed by the VSL 308. At 402, it determines a predefined V_(R) based on a specified performance state. For example, it could obtain this value from a P-state data structure, either within the processor 302 or off-chip, e.g., in memory used for the BIOS or operating system. It could even be programmed (e.g., fused) into the processor chip itself. At 404, the number of idle cores 306 is determined. (Note that routine actions 402 and 404, as with any routine actions described herein, may be performed in any order unless expressly indicated to the contrary, or otherwise dictated by the nature of the actions.)

At 406, a reduction ΔV_(R) in the drop across the load-line resistance is determined. This may be done in various different ways, depending on the processor configuration and particular design concerns, as discussed further below. At 408, a V_(R) request based on the determined ΔV_(R) (i.e., the predefined V_(R) reduced by ΔV_(R)) is provided to the VRM.
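The four actions of routine 400 can be summarized in a minimal Python sketch. The P-state table contents, the per-idle-core ΔV_(R) values, and the function name are hypothetical; action 406 is reduced here to a simple per-core multiplication, which is only one of the ways the text allows.

```python
# Hypothetical P-state data: predefined V_R (volts) per performance state.
P_STATE_V_R = {"P0": 1.30, "P1": 1.20}
# Hypothetical per-idle-core delta-V_R (volts) per performance state.
DELTA_V_PER_IDLE_CORE = {"P0": 0.020, "P1": 0.015}


def select_requested_voltage(p_state: str, core_idle_flags: list) -> float:
    """Return the V_R value to request from the VRM."""
    v_r_predefined = P_STATE_V_R[p_state]                     # 402
    idle_cores = sum(1 for idle in core_idle_flags if idle)   # 404
    delta_v_r = idle_cores * DELTA_V_PER_IDLE_CORE[p_state]   # 406
    return v_r_predefined - delta_v_r                         # 408


requested = select_requested_voltage("P0", [True, False, True, False])
```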

FIG. 5 shows a routine 406 to determine ΔV_(R) based on a number of idle cores, i, in accordance with some embodiments. At 502, it determines the total dynamic capacitance for the chip (C_(Dyn)), frequency F, and supply voltage V_(C) for the applicable performance state. At 504, it determines the amount of reduced supply current (ΔI_(C)) using the formula: ΔI_(C)=(C_(Dyn)·F·V_(C))(i/N), where i is the number of idle cores, and N is the total number of cores. F and V_(C) will typically be defined in the performance state specification, and C_(Dyn) (for the processor chip) may be provided by the chip manufacturer or determined through parameter characterization. It could be programmed into the chip during manufacturing, or it could be made available from an external memory source.

At 506, a value for ΔV_(R) is determined by multiplying the determined ΔI_(C) by R_(LL). As with the other parameters, R_(LL) too could be programmed into the chip (burned, loaded as machine code), or it could be made available to it from an external memory source.
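The calculation of FIG. 5 can be written out as a short Python sketch. The formulas are those given above; the function names and the example parameter values (20 nF of total dynamic capacitance, 3 GHz, 1.0 V, 4 cores, 1 mΩ) are assumptions for illustration only.

```python
def delta_i_c(c_dyn: float, freq_hz: float, v_c: float,
              idle_cores: int, total_cores: int) -> float:
    """504: reduced supply current, (C_Dyn * F * V_C) * (i / N)."""
    if not 0 <= idle_cores <= total_cores:
        raise ValueError("idle_cores must be between 0 and total_cores")
    return (c_dyn * freq_hz * v_c) * (idle_cores / total_cores)


def delta_v_r(c_dyn: float, freq_hz: float, v_c: float,
              idle_cores: int, total_cores: int, r_ll: float) -> float:
    """506: load-line drop reduction, delta-I_C * R_LL."""
    return delta_i_c(c_dyn, freq_hz, v_c, idle_cores, total_cores) * r_ll


# Hypothetical chip: 60 A of dynamic current with all 4 cores active.
di = delta_i_c(20e-9, 3e9, 1.0, 1, 4)
dv = delta_v_r(20e-9, 3e9, 1.0, 1, 4, 0.001)
```

With these assumed values, one idle core out of four removes about 15 A of dynamic current, which corresponds to roughly 15 mV less drop across the load-line.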

It should be appreciated that ΔV_(R) could be determined in various other ways and is not limited to the routine of FIG. 5. For example, a suitable ΔV_(R) value could be retrieved (looked up) based on the P-state and number of idle cores. This might consume a relatively large amount of memory but could be feasible depending on design concerns and how it is implemented. For example, maximum and minimum ΔV_(R) values could be burned or fused in a processor chip, and a data structure containing the different values could be generated and stored in memory, similar to how the BIOS generates P-state data in some embodiments. Alternatively, for greater flexibility, ΔI_(C) values (instead of ΔV_(R) values) could be programmed into or generated for a table to be retrieved based on particular operating parameters (e.g., P-state) and the number of idle cores. In this way, a ΔV_(R) value could be determined for any VRM and power delivery network. The value for R_(LL) for a given implementation could then be provided to the VSL 308 from a source, e.g., at start-up. For example, it could be stored in a BIOS register or even burned into firmware at the factory for a particular power delivery network configuration. As will be appreciated, numerous other methods may be implemented and are within the scope of the claims.
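The table-based alternative, in which ΔI_(C) values are looked up and R_(LL) is supplied separately for the particular power delivery network, might be sketched in Python as follows. The table contents, the class name, and the way R_(LL) is passed in are all hypothetical.

```python
# Hypothetical delta-I_C table (amps), keyed by (P-state, number of idle cores).
DELTA_I_C_TABLE = {
    ("P0", 0): 0.0, ("P0", 1): 15.0, ("P0", 2): 30.0,
    ("P1", 0): 0.0, ("P1", 1): 10.0, ("P1", 2): 20.0,
}


class VoltageReductionLookup:
    """Combines platform-independent delta-I_C data with a platform R_LL."""

    def __init__(self, r_ll: float):
        # R_LL is provided once, e.g. at start-up, for this board's network.
        self.r_ll = r_ll

    def delta_v_r(self, p_state: str, idle_cores: int) -> float:
        return DELTA_I_C_TABLE[(p_state, idle_cores)] * self.r_ll


lookup = VoltageReductionLookup(r_ll=0.0012)  # hypothetical 1.2 milliohm load-line
reduction = lookup.delta_v_r("P0", 2)
```

Keeping the table in units of current is what lets the same data serve different VRMs and power delivery networks; only the R_(LL) value changes per platform.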

FIG. 6 is a block diagram of voltage selection logic (VSL) 308 in accordance with some embodiments. It generally comprises parameter registers 602 for the various cores (core 0 to core N−1), P-state resolve logic 604, adder logic 606, multiplexer 608, multiplier logic 610, V/F command register 612, and subtraction logic 614, all coupled together as shown. (Note that in this embodiment, voltage reduction logic is incorporated into legacy voltage/frequency logic, tapping into the V_(R) output from the V/F command register 612 to provide the reduced V_(R) (V_(R)−ΔV_(R)) request. This is not required, though. Numerous different ways to modify an existing design or create a new and/or separate VSL may be employed. Along these lines, the VSL blocks may be implemented with any combination of circuit elements, logic, and/or machine code as may be desired for a particular design.)

The parameter registers 602 each receive a P-state identifier for their associated core, along with a ΔV_(Ri) (per idle-core) value for the requested P-state. The parameter registers 602 provide their P-states to the P-state resolve logic 604, which processes the P-state requests for the cores and selects a P-state to be applied to all of the cores. For example, in some embodiments, it selects the most active requested P-state from the requested P-states. In addition, the parameter registers 602 provide to adder logic 606 a digital value indicating whether or not their associated core is idle. The adder logic 606 combines (sums) these values to produce a result to the multiplier logic 610 indicating how many cores are idle. Finally, the parameter registers 602 provide to multiplexer 608 ΔV_(Ri) information for their requested P-state. The selected P-state signal from P-state resolve logic 604 selects the voltage reduction factor (ΔV_(Ri)) associated with the selected P-state. This value is provided to multiplier 610 and multiplied by the number of idle cores to obtain a net voltage reduction value (ΔV_(R)). This product is then subtracted from the V_(R) value provided from the V/F command register 612, and the result is provided to the VRM 104. For example, it may be provided to one or more voltage select pins or to an off-chip interface to be communicated to the VRM.
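The data flow through the blocks of FIG. 6 can be modeled behaviorally in Python. This is a sketch under stated assumptions, not a description of the actual circuit: P-states are encoded as integers with 0 meaning P0 (most active), the V/F command register is represented by a plain V_(R) argument, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParameterRegister:
    """Per-core register 602 (field names are illustrative)."""
    p_state: int       # requested P-state, 0 = P0 = most active
    delta_v_ri: float  # per-idle-core voltage reduction for that P-state
    idle: bool         # whether the associated core is idle


def requested_v_r(registers: List[ParameterRegister], v_r_command: float) -> float:
    # 604: resolve to the most active requested P-state.
    selected = min(r.p_state for r in registers)
    # 606: sum the idle indications to count idle cores.
    idle_count = sum(int(r.idle) for r in registers)
    # 608: select the delta-V_Ri associated with the selected P-state.
    delta_v_ri = next(r.delta_v_ri for r in registers if r.p_state == selected)
    # 610: net reduction; 614: subtract from the V/F command register value.
    return v_r_command - delta_v_ri * idle_count


regs = [
    ParameterRegister(0, 0.020, False),
    ParameterRegister(1, 0.015, True),
    ParameterRegister(1, 0.015, True),
    ParameterRegister(0, 0.020, False),
]
v_r = requested_v_r(regs, v_r_command=1.30)
```

In this example two cores are idle and the resolved state is P0, so the P0 reduction factor is applied twice.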

FIG. 7 is a block diagram of a computer system 700 having voltage selection logic (VSL) 708 to provide a dynamically reducible V_(R) request to the VRM when supply current decreases, in accordance with some embodiments of the invention. The computer system 700 may be a personal computer system or corporate computer system such as, for example, a laptop, notebook or desktop computer system. The computer system 700 may include one or more processors 702, which may include sub-blocks such as, but not limited to, one or more cores, illustrated by cores 704 (core 1 to core N), and power management logic (PML) 706, which may include VSL 708, which in some embodiments may be implemented as a module which includes equivalent logic, as one of ordinary skill in the relevant art would appreciate based at least on the teachings described herein.

The one or more processor(s) 702 may be an Intel® Architecture microprocessor. For other embodiments, the processor(s) may be a different type of processor such as, for example, a graphics processor, a digital signal processor, an embedded processor, etc. and/or may implement a different architecture.

The one or more processors 702 may be operated with one or more clock sources 709 and provided with power from one or more voltage regulator modules (VRMs) 104. The one or more processors 702 may also communicate with other levels of memory, such as memory 712. Higher memory hierarchy levels such as system memory (RAM) 718a and storage 718b, such as a mass storage device which may be included within the system or accessible by the system, may be accessed via host bus 714 and a chip set 716.

In addition, other functional units such as a graphics interface 720 and a network interface 722, to name just a few, may communicate with the one or more processors 702 via appropriate busses or ports. Other devices such as an antenna (not shown) could be coupled to the network interface to couple the one or more processors to a wireless network.

Furthermore, one of ordinary skill would recognize that some or all of the components shown may be implemented using a different partitioning and/or integration approach, in variation to what is shown in FIG. 7, without departing from the spirit or scope of the embodiment as described.

For some embodiments of the invention, the storage 718b may store software such as, for example, an operating system 724. For one embodiment, the operating system is a Windows® operating system, available from Microsoft Corporation of Redmond, Wash., that includes features and functionality according to the Advanced Configuration and Power Interface (ACPI) Standard and/or that provides for Operating System-directed Power Management (OSPM). For some embodiments, the operating system may be a different type of operating system such as, for example, a Linux operating system.

While the system 700 may be a personal computing system, other types of systems such as, for example, other types of computers (e.g., handhelds, servers, tablets, web appliances, routers, etc.), wireless communications devices (e.g., cellular phones, cordless phones, pagers, personal digital assistants, etc.), computer-related peripherals (e.g., printers, scanners, monitors, etc.), entertainment devices (e.g., televisions, radios, stereos, tape and compact disc players, video cassette recorders, camcorders, digital cameras, MP3 (Moving Picture Experts Group, Audio Layer 3) players, video games, watches, etc.), and the like are also within the scope of various embodiments. The memory circuits represented by the various foregoing figures may also be of any type and may be implemented in any of the above-described systems.

The VSL 708 may operate in cooperation with other features and functions of the processor(s) 702 such as the power management logic 706. In particular, the power management logic of one embodiment may control power management of the processor(s) 702 and/or of the individual core(s) 704, including transitions between various power states. Where the operating system 724 supports ACPI, for example, the VSL 708 may control and track the C-states of the various core(s) and/or the P-states. The power management logic 706 may also store or otherwise have access to other information to be used in managing the dynamic requested VRM voltage of one or more embodiments such as, for example, the amount of active memory and/or one or more cores, a minimum cache memory size, timer information, and/or other information stored in registers or other data stores.

Furthermore, as one of ordinary skill in the relevant arts would appreciate, the VSL 708 may use additional intermediate states, as well as larger and/or smaller states, for some embodiments of the invention.

While many specifics of one or more embodiments have been described above, it will be appreciated that other approaches for dynamically reducing requested supply voltage may be implemented for other embodiments. For example, while specific power states are mentioned above, for other embodiments, other power states and/or other factors may be considered in determining that an effective requested supply voltage is to be increased or decreased.

Further, while a dynamic supply based on idle cores is discussed for chips with a single supplied voltage (e.g., from a VRM) for purposes of example, it will be appreciated that a requested supply voltage approach according to one or more embodiments may be applied to a different type of power delivery and/or host integrated circuit chip and/or system.

For example, a processor with multiple cores in multiple supply domains, such as is shown in FIG. 8, could employ supply reduction as taught herein. Processor 802 comprises N different supply domains 804_(i), each coupled to an associated VRM 104_(i) to provide its domain with a separately controllable supply V_(C) in response to a requested V_(R). Each domain comprises one or more cores 806_(i) and a VSL 808_(i) to request a supply V_(Ri) based on a number of idle cores within its domain.
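The multi-domain arrangement of FIG. 8 can be pictured as the same per-domain computation applied independently to each supply domain. The Python sketch below assumes a simple per-idle-core reduction within each domain; the class, field names, and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SupplyDomain:
    """State a per-domain VSL would need (illustrative fields only)."""
    v_r_predefined: float        # V_R otherwise requested for this domain
    idle_cores: int              # idle cores within this domain only
    delta_v_per_idle_core: float


def per_domain_requests(domains: Dict[str, SupplyDomain]) -> Dict[str, float]:
    """Each domain's request V_Ri depends only on idle cores in that domain."""
    return {
        name: d.v_r_predefined - d.idle_cores * d.delta_v_per_idle_core
        for name, d in domains.items()
    }


requests = per_domain_requests({
    "domain0": SupplyDomain(1.30, 2, 0.020),
    "domain1": SupplyDomain(1.20, 0, 0.015),
})
```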

Any reference in this specification to “one embodiment,” “an embodiment,” “example embodiment,” etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with any embodiment, it is submitted that it is within the purview of one skilled in the art to affect such feature, structure, or characteristic in connection with other ones of the embodiments. Alternative embodiments of the invention also include machine-accessible media containing instructions for performing the operations of the invention. Such embodiments may also be referred to as program products. Such machine-accessible media may include, without limitation, storage media such as floppy disks, hard disks, CD-ROMs, ROM, and RAM, and other tangible arrangements of particles or molecules manufactured or formed, or otherwise detectable by, a machine or device. Instructions may also be used in a distributed environment, and may be stored locally and/or remotely for access by single or multi-processor machines.

Furthermore, for ease of understanding, certain method procedures may have been delineated as separate procedures; however, these separately delineated procedures should not be construed as necessarily order dependent in their performance. That is, some procedures may be able to be performed in an alternative ordering or simultaneously, as one of ordinary skill would appreciate based at least on the teachings provided herein.

Embodiments of the present invention may be described in sufficient detail to enable those skilled in the art to practice the invention. Other embodiments may be utilized, and structural, logical, and intellectual changes may be made without departing from the scope of the present invention. Moreover, it is to be understood that various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments. Accordingly, the detailed description is not to be taken in a limiting sense.

The foregoing embodiments and advantages are merely exemplary and are not to be construed as limiting the present invention. For instance, the present teaching can be readily applied to other types of memories. Those skilled in the art can appreciate from the foregoing description that the techniques of the embodiments of the invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.

1. An integrated circuit apparatus comprising: one or more functional blocks that may be idle when the apparatus is operating; and logic to determine a supply voltage level to be requested from a VRM based on a predefined level, reduced by an amount proportional to a number of the one or more functional blocks that are idle.
2. The apparatus of claim 1, in which the one or more functional blocks are cores in a processor.
3. The apparatus of claim 2, in which the predefined level is based on a performance state for the processor.
4. The apparatus of claim 3, wherein the predefined level is reduced by an amount determined from looking up a current per-core parameter for the performance state.
5. The apparatus of claim 3, wherein the predefined level is reduced by an amount based on determining reduced supply current as a function of IC dynamic capacitance.
6. The apparatus of claim 3, wherein the predefined level is reduced by an amount determined from looking up a ΔV per-core parameter for the performance state.
7. The apparatus of claim 1, in which the functional blocks consume substantially the same dynamic current when they are not idle and at the same performance state.
8. The apparatus of claim 1, in which the logic and one or more functional blocks are part of a supply domain, which is one of a plurality of supply domains within the apparatus.
9. A method, comprising: determining a pre-specified supply voltage to be requested from a VRM; determining how many functional blocks in an operating integrated circuit are idle; and reducing the pre-specified supply voltage based on how many functional blocks are idle.
10. The method of claim 9, further comprising providing the reduced supply voltage to the VRM.
11. The method of claim 9, in which the functional blocks are cores in a multi-core processor.
12. The method of claim 9, comprising determining a per-core dynamic current value and multiplying the value by the number of idle cores to determine a supply current reduction to determine how much to reduce the pre-specified supply voltage.
13. The method of claim 12, comprising determining how much to reduce the supply voltage based on the reduced supply current and a load-line resistance value.
14. The method of claim 13, in which the load-line resistance value is fused into the integrated circuit.
15. The method of claim 14, in which the load-line resistance value is retrieved from outside of the integrated circuit.
16. The method of claim 9, comprising determining a per-core ΔV_(R) value and multiplying the value by the number of idle cores to determine how much to reduce the pre-specified supply voltage.
17. A processor, comprising: a plurality of cores to be operable at a selected one of a number of performance states; and voltage selection logic to request a supply voltage V_(R) to provide the cores with a received supply voltage V_(C), wherein V_(R) is proportional to how many of the cores are idle when the processor is operating.
18. The processor of claim 17, in which V_(R) is to be determined from reducing a pre-specified V_(R), based on the selected performance state, by an amount derived from determining a supply current reduction.
19. The processor of claim 18, in which the reduction in supply current is determined by retrieving a per-core supply current value from a performance state data structure.
20. The processor of claim 17, in which V_(R) is to be determined from reducing a pre-specified V_(R), based on the selected performance state, by an amount derived from determining a per-core voltage value.
21. The processor of claim 20, in which the per-core voltage value is to be retrieved from a performance state data structure.
22. The processor of claim 17, in which V_(R) is to be requested from a VRM.
23. The processor of claim 22, comprising at least part of the VRM.
24. The processor of claim 22, in which the plurality of cores and voltage selection logic are part of a supply domain that is one of a multiplicity of supply domains of the processor.
25. A computer system, comprising: a processor having: a plurality of cores to be operable at a selected one of a number of performance states, and voltage selection logic to request a supply voltage V_(R) to provide the cores with a received supply voltage V_(C), wherein V_(R) is proportional to how many of the cores are idle when the processor is operating; at least part of a voltage regulator to generate the V_(R) supply to provide V_(C) to the processor; and an antenna to be coupled to the processor to communicatively link it with a wireless network.
26. The computer system of claim 25, in which V_(R) is to be determined from reducing a pre-specified V_(R), based on the selected performance state, by an amount derived from determining a supply current reduction.
27. The computer system of claim 26, in which the reduction in supply current is determined by retrieving a per-core supply current value from a performance state data structure.
28. The computer system of claim 25, in which V_(R) is to be determined from reducing a pre-specified V_(R), based on the selected performance state, by an amount derived from determining a per-core voltage value.
29. The computer system of claim 28, in which the per-core voltage value is to be retrieved from a performance state data structure.